Improving Instruction-Level Parallelism by Exploiting Global Value Locality

Authors

  • Jian Huang
  • David J. Lilja

Abstract

Several studies have pointed out that the values produced by executing a program's instructions are often highly repetitive. Two basic approaches have been proposed for exploiting this value locality: 1) reusing the result of a prior execution of an instruction, and 2) predicting the value that the current execution of an instruction will produce, based on the values it has produced previously. Existing value reuse and prediction schemes operate at the level of a single instruction or a short sequence of instructions, so the caches that temporarily store previous values are typically indexed by instruction address. However, we have found that different instructions often produce the same values. We introduce the Speculative Value Cache (SVC), which globally exploits values previously produced by any instruction by indexing the cache with a hash of the values of the instruction's input operands rather than with the instruction address. With this approach, the current instruction can directly use a result value found in the SVC even if that value was produced by a different instruction. We partition the SVC into sections by instruction category, such as arithmetic, shift, or load, since different types of instructions produce different results when given the same input operands. Our simulation results, based on the SimpleScalar simulator, show that embedding the SVC in a realistic 4-issue superscalar processor could improve the performance of the SPEC95 integer benchmarks by as much as 25%. Increasing the issue width allows this mechanism to achieve even higher performance: an 8-issue processor achieves speedups of 5-42%, and a 16-issue processor achieves speedups of up to 50%. We also demonstrate the sensitivity of these results to changes in important design parameters.
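
The core idea, indexing by operand values instead of instruction address and partitioning by instruction category, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the opcode table, the 64-set direct-mapped organization, the `31 * a XOR b` hash, and the full tag (opcode plus both operand values) are all hypothetical choices made for illustration. Because the sketch keeps full tags, an ALU hit here is always correct; a design that stores partial tags, or that caches load values, would have to verify the speculative value, which this sketch omits.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASK32 = 0xFFFFFFFF

# Hypothetical opcode -> (category, semantics) table. The abstract names
# arithmetic, shift, and load categories; these opcodes are illustrative only.
OPS: Dict[str, Tuple[str, Callable[[int, int], int]]] = {
    "add": ("arith", lambda a, b: (a + b) & MASK32),
    "sub": ("arith", lambda a, b: (a - b) & MASK32),
    "sll": ("shift", lambda a, b: (a << (b & 31)) & MASK32),
    "srl": ("shift", lambda a, b: (a & MASK32) >> (b & 31)),
}

Entry = Tuple[str, int, int, int]  # (opcode, operand_a, operand_b, result)


class SpeculativeValueCache:
    """Toy direct-mapped value cache with one partition per instruction category.

    It is indexed by a hash of the input operand VALUES, not by the instruction
    address, so a result produced by one instruction can be reused by another.
    """

    def __init__(self, sets_per_partition: int = 64) -> None:
        self.sets = sets_per_partition
        categories = {category for category, _ in OPS.values()}
        self.partitions: Dict[str, List[Optional[Entry]]] = {
            category: [None] * sets_per_partition for category in categories
        }
        self.hits = 0
        self.misses = 0

    def _index(self, a: int, b: int) -> int:
        # Hypothetical hash: mix the two 32-bit operand values, fold to a set index.
        return (((a & MASK32) * 31) ^ (b & MASK32)) % self.sets

    def lookup(self, opcode: str, a: int, b: int) -> Optional[int]:
        """Return the cached result on a hit, else None."""
        category, _ = OPS[opcode]
        entry = self.partitions[category][self._index(a, b)]
        # Assumed tag check: opcode and both operand values must match exactly.
        if entry is not None and entry[:3] == (opcode, a, b):
            self.hits += 1
            return entry[3]
        self.misses += 1
        return None

    def update(self, opcode: str, a: int, b: int, result: int) -> None:
        """Fill (or overwrite) the entry selected by the operand-value hash."""
        category, _ = OPS[opcode]
        self.partitions[category][self._index(a, b)] = (opcode, a, b, result)


def execute(svc: SpeculativeValueCache, opcode: str, a: int, b: int) -> int:
    """Reuse an SVC value if present; otherwise compute it and fill the cache.

    Note that no instruction address (PC) is passed in: reuse depends only on
    the opcode and the operand values.
    """
    cached = svc.lookup(opcode, a, b)
    if cached is not None:
        return cached
    result = OPS[opcode][1](a, b)
    svc.update(opcode, a, b, result)
    return result


if __name__ == "__main__":
    svc = SpeculativeValueCache()
    execute(svc, "add", 7, 5)  # miss: computed by one static instruction, then stored
    execute(svc, "add", 7, 5)  # hit: a different static instruction reuses the value
    execute(svc, "sub", 7, 5)  # miss: same operands and partition, different opcode
    print(svc.hits, svc.misses)  # prints "1 2"
```

An address-indexed last-value table would miss on the second `add` above if it came from a different instruction address; the value-indexed lookup hits because only the opcode and operand values matter. The per-category partitions keep, for example, shift results from evicting arithmetic results that happen to hash to the same set.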

Similar resources

Exploiting Superword-Level Locality in Multimedia Extension Architectures

In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that support operations on superwords, i.e., aggregate objects consisting of several machine words. We treat the large superword register file as a compiler-controlled cache, thus avoiding unnecessary memory accesses by exploitin...

Compilation techniques for parallel systems

Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through w...

Scalable and Flexible heterogeneous multi-core system

Multi-core systems are widely used in today's applications because of their lower power consumption and high performance. Many researchers aim to improve the performance of these systems by providing flexible multi-core architectures. Flexibility in a multi-core processor system provides high throughput for uniform parallel applications as well as high performance for more general work. This fl...

Compiler-Directed Static Classification of Value Locality Behavior

Predicting the values that are likely to be produced by instructions has been suggested as a way of increasing the instruction-level parallelism available in a wide-issue processor. One of the potential difficulties in exploiting the predictability of values, however, is selecting the proper type of predictor, such as a last-value predictor, a stride predictor, or a context-based predictor, for...

Improving Processor Performance Through Compiler-Assisted Block Reuse

Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed in...

Publication date: 1998